AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Adusumilli, Karun, Kasy, Maximilian, Wilson, Ashia

From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

arXiv.org Machine LearningMar-24-2026

We derive the asymptotic risk function of regularized empirical risk minimization (ERM) estimators tuned by $n$-fold cross-validation (CV). The out-of-sample prediction loss of such estimators converges in distribution to the squared-error loss (risk function) of shrinkage estimators in the normal means model, tuned by Stein's unbiased risk estimate (SURE). This risk function provides a more fine-grained picture of predictive performance than uniform bounds on worst-case regret, which are common in learning theory: it quantifies how risk varies with the true parameter. As key intermediate steps, we show that (i) $n$-fold CV converges uniformly to SURE, and (ii) while SURE typically has multiple local minima, its global minimum is generically well separated. Well-separation ensures that uniform convergence of CV to SURE translates into convergence of the tuning parameter chosen by CV to that chosen by SURE.

artificial intelligence, estimator, machine learning, (19 more...)

2603.20388

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsMar-23-2026, 11:03:09 GMT

Fast learning rates with heavy-tailed losses

Vu C. Dinh, Lam S. Ho, Binh Nguyen, Duy Nguyen

Neural Information Processing Systems http://nips.cc/

artificial intelligence, bernstein, machine learning, (18 more...)

Country: North America > United States > California (0.28)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Neural Information Processing SystemsFeb-8-2026, 13:16:06 GMT

40739b3bb584c117b3e2f418d17f63a1-Paper-Conference.pdf

loss function, perception error, perception system, (17 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.68)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)
(2 more...)

Neural Information Processing SystemsFeb-7-2026, 15:04:38 GMT

15f99f2165aa8c86c9dface16fefd281-Supplemental.pdf

artificial intelligence, machine learning, risk function, (19 more...)

Genre: Contests & Prizes (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Zhang, Zhixian, Hou, Xiaotian, Zhang, Linjun

Unified Inference Framework for Single and Multi-Player Performative Prediction: Method and Asymptotic Optimality

arXiv.org Machine LearningFeb-4-2026

Performative prediction characterizes environments where predictive models alter the very data distributions they aim to forecast, triggering complex feedback loops. While prior research treats single-agent and multi-agent performativity as distinct phenomena, this paper introduces a unified statistical inference framework that bridges these contexts, treating the former as a special case of the latter. Our contribution is two-fold. First, we put forward the Repeated Risk Minimization (RRM) procedure for estimating the performative stability, and establish a rigorous inferential theory for admitting its asymptotic normality and confirming its asymptotic efficiency. Second, for the performative optimality, we introduce a novel two-step plug-in estimator that integrates the idea of Recalibrated Prediction Powered Inference (RePPI) with Importance Sampling, and further provide formal derivations for the Central Limit Theorems of both the underlying distributional parameters and the plug-in results. The theoretical analysis demonstrates that our estimator achieves the semiparametric efficiency bound and maintains robustness under mild distributional misspecification. This work provides a principled toolkit for reliable estimation and decision-making in dynamic, performative environments.

artificial intelligence, machine learning, modeling & simulation, (17 more...)

2602.03049

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine (0.67)
Banking & Finance (0.45)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.65)

Jung, Yonghan, Kang, Bogyeong

Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding

arXiv.org Machine LearningJan-27-2026

We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full structural causal model specifications; or focus solely on population-level averages while neglecting covariate-conditional treatment effects. We overcome all four limitations simultaneously by establishing novel information-theoretic, data-driven divergence bounds. Our key theoretical contribution shows that the f-divergence between the observational distribution P(Y | A = a, X = x) and the interventional distribution P(Y | do(A = a), X = x) is upper bounded by a function of the propensity score alone. This result enables sharp partial identification of conditional causal effects directly from observational data, without requiring external sensitivity parameters, auxiliary variables, full structural specifications, or outcome boundedness assumptions. For practical implementation, we develop a semiparametric estimator satisfying Neyman orthogonality (Chernozhukov et al., 2018), which ensures square-root-n consistent inference even when nuisance functions are estimated using flexible machine learning methods. Simulation studies and real-world data applications, implemented in the GitHub repository (https://github.com/yonghanjung/Information-Theretic-Bounds), demonstrate that our framework provides tight and valid causal bounds across a wide range of data-generating processes.

artificial intelligence, machine learning, nullnull null, (18 more...)

2601.1716

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing SystemsDec-26-2025, 02:03:19 GMT

Surfing: Iterative Optimization Over Incrementally Trained Deep Networks

We investigate a sequential optimization procedure to minimize the empirical risk functional $f_{\hat\theta}(x) = \frac{1}{2}\|G_{\hat\theta}(x) - y\|^2$ for certain families of deep networks $G_{\theta}(x)$. The approach is to optimize a sequence of objective functions that use network parameters obtained during different stages of the training process. When initialized with random parameters $\theta_0$, we show that the objective $f_{\theta_0}(x)$ is ``nice'' and easy to optimize with gradient descent. As learning is carried out, we obtain a sequence of generative networks $x \mapsto G_{\theta_t}(x)$ and associated risk functions $f_{\theta_t}(x)$, where $t$ indicates a stage of stochastic gradient descent during training. Since the parameters of the network do not change by very much in each step, the surface evolves slowly and can be incrementally optimized. The algorithm is formalized and analyzed for a family of expansive networks. We call the procedure {\it surfing} since it rides along the peak of the evolving (negative) empirical risk function, starting from a smooth surface at the beginning of learning and ending with a wavy nonconvex surface after learning is complete. Experiments show how surfing can be used to find the global optimum and for compressed sensing even when direct gradient descent on the final learned network fails.

gradient descent, iterative optimization, name change, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Neural Information Processing SystemsOct-3-2025, 09:16:45 GMT

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao

Meta-learning, or 'learning to learn' [ However, it can be any useful knowledge which can be learned in the meta-training stage. This setting is intuitive and theoretically sound.

arxiv preprint arxiv, maml, optimization, (13 more...)

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Ghosh, Prasenjit, Bhattacharya, Anirban, Pati, Debdeep

On Quantification of Borrowing of Information in Hierarchical Bayesian Models

arXiv.org Machine LearningSep-23-2025

In this work, we offer a thorough analytical investigation into the role of shared hyperparameters in a hierarchical Bayesian model, examining their impact on information borrowing and posterior inference. Our approach is rooted in a non-asymptotic framework, where observations are drawn from a mixed-effects model, and a Gaussian distribution is assumed for the true effect generator. We consider a nested hierarchical prior distribution model to capture these effects and use the posterior means for Bayesian estimation. To quantify the effect of information borrowing, we propose an integrated risk measure relative to the true data-generating distribution. Our analysis reveals that the Bayes estimator for the model with a deeper hierarchy performs better, provided that the unknown random effects are correlated through a compound symmetric structure. Our work also identifies necessary and sufficient conditions for this model to outperform the one nested within it. We further obtain sufficient conditions when the correlation is perturbed. Our study suggests that the model with a deeper hierarchy tends to outperform the nested model unless the true data-generating distribution favors sufficiently independent groups. These findings have significant implications for Bayesian modeling, and we believe they will be of interest to researchers across a wide range of fields.

estimator, phb, theorem 1, (16 more...)